ABSTRACT

Amino acid sequences of two giant non-structural
polyproteins (F1 and F2) of infectious bronchitis virus (IBV),
a member of Coronaviridae, were compared, by computer-assisted
methods, to sequences of a number of other positive strand RNA
viral and cellular proteins. By this approach, juxtaposed
putative RNA-dependent RNA polymerase, nucleic acid binding
("finger"-like) and RNA helicase domains were identified in F2.
Together, these domains might constitute the core of the
protein complex involved in the primer-dependent transcription,
replication and recombination of coronaviruses. In F1, two
cysteine protease-like domains and a growth factor-like one
were revealed. One of the putative proteases of IBV is similar
to 3C proteases of picornaviruses and related enzymes of como-,
nepo- and potyviruses. A search of the IBV F1 and F2 sequences
for sites similar to those cleaved by the latter proteases, and
intercomparison of the surrounding sequence stretches, revealed
13 dipeptides Q/S(G) which are probably cleaved by the
coronavirus 3C-like protease. Based on these observations, a
partial tentative scheme for the functional organization and
expression strategy of the non-structural polyproteins of IBV
was proposed. It implies that, despite the general similarity
to other positive strand RNA viruses, and particularly to
potyviruses, coronaviruses possess a number of unique
structural and functional features.


INTRODUCTION 

Coronaviruses are enveloped positive strand RNA viruses
having by far the largest genome in this virus class (1-3).
Recently, the genome sequence of the type member of
Coronaviridae, avian infectious bronchitis virus (IBV), has
been completed (4). The total length of the IBV genome is
27 608 nucleotides, excluding the 3'-terminal poly(A). Of these,
about 8 000 nucleotides at the 3'-end are dedicated to coding
for virion and some small non-structural proteins, expressed as
a nested set of 3'-co-terminal mRNAs, with probably only the
5'-terminal "unique" part translated in each (2). The
5'-terminal part of the genomic RNA (approx. 20 000 nucleotides)
contains two large ORFs, potentially encoding two non-structural
polypeptides (F1 and F2) of 441 and 300 kD, respectively. As no
subgenomic mRNA corresponding to the F2 polypeptide has been
detected, it was suggested that the two ORFs are expressed as a
single giant polyprotein, via ribosomal frame-shifting (4).
Subsequently, experimental evidence has been obtained
corroborating this hypothesis (5).
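As a rough consistency check on these figures (not part of the original analysis), assume the commonly used average mass of about 110 Da per amino acid residue. The combined size of F1 and F2 then corresponds closely to the coding capacity of the approx. 20 000-nucleotide 5'-terminal region:

$$\frac{(441 + 300) \times 10^{3}\ \text{Da}}{110\ \text{Da per residue}} \approx 6\,740\ \text{residues}, \qquad 6\,740 \times 3 \approx 20\,200\ \text{nucleotides}.$$

Because the 110 Da value is only an average, the agreement should be read as approximate.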

Functional organization of the F1-F2 polyprotein of IBV
remained, until very recently, completely obscure. Only a short
region of F2 had been shown to possess considerable
similarity to non-structural proteins of alphaviruses and
certain plant viruses (4). We demonstrated that this segment in
fact comprised a part of a domain containing an NTP-binding
sequence motif and belonging to a vast superfamily of positive
strand RNA viral proteins in which this motif is the most
conserved sequence (6). Moreover, it has been shown that one of
the three protein families constituting this superfamily, the
IBV domain included, possessed highly significant sequence
similarity to DNA helicases (7-9). We suggested that proteins
of this family could be RNA helicases involved in duplex
unwinding during viral RNA replication (7,8). Encouraged by
these observations, we performed a systematic search of the
sequences of the large non-structural polypeptides of IBV for
sequence stretches similar to highly conserved proteins of
positive strand RNA viruses and to certain cellular proteins.
Here we report the results of this study and discuss
implications for the functional organization and expression
strategy of the IBV genome.


METHODS 

Amino acid sequence comparisons

Amino acid sequences were from current literature; for 
abbreviations and references see legends to figures. 

Comparisons were done by the programs MULDI (MULtiple DIagon)
and OPTAL (OPTimal ALignment). Program MULDI is a modification
of the standard DIAGON (10) designed to reveal highly conserved
segments in amino acid sequences. Groups of aligned amino acid
sequences are compared in a diagonal plot, utilizing the MDM78
amino acid residue comparison matrix (10). The result may be
considered a superposition of several pairwise local similarity
maps in which only the streaks corresponding to highly conserved
segments are retained. MULDI is in principle similar to the
program recently described by Argos (11). Program OPTAL (6,12),
based on the original algorithm of Sankoff (13), performs
stepwise optimal alignment of multiple amino acid sequences and
its statistical assessment by a Monte Carlo procedure. The
adjusted alignment score is calculated in standard deviation
(SD) units:

$$AS = \frac{S_o - S_r}{\sigma}$$

where $S_o$ is the score obtained for a given comparison
utilizing the MDM78 scoring matrix, $S_r$ is the mean score
obtained upon intercomparison of 25 randomly jumbled sequences
(or sequence sets) identical to the real ones in amino acid
composition, and $\sigma$ is the standard deviation. The
programs were written in FORTRAN77 and run on an ES-1060 computer.
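The Monte Carlo assessment behind AS can be sketched as follows. This is a minimal illustration under stated assumptions, not OPTAL itself: the MDM78 matrix is replaced by a crude score built from the similarity groups listed in the Fig.1 legend (2 for identity, 1 for same group, 0 otherwise), Sankoff's multiple alignment is replaced by a pairwise Needleman-Wunsch global score with a hypothetical linear gap penalty of -2, and only the query is shuffled, as described for program SCORE. The names `pair_score`, `global_score` and `adjusted_score` are illustrative.

```python
import random
import statistics

# Similarity groups from the Fig. 1 legend, used as a crude stand-in for
# the MDM78 matrix (an assumption; the original programs used MDM78).
GROUPS = ["LIVM", "AG", "ST", "DENQ", "KR", "FYW"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def pair_score(a: str, b: str) -> int:
    """2 for identical residues, 1 for residues in the same group, else 0."""
    if a == b:
        return 2
    group = GROUP_OF.get(a)
    return 1 if group is not None and group == GROUP_OF.get(b) else 0


def global_score(x: str, y: str, gap: int = -2) -> int:
    """Optimal global (Needleman-Wunsch) alignment score, linear gap penalty."""
    prev = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * gap] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = max(prev[j - 1] + pair_score(x[i - 1], y[j - 1]),
                         prev[j] + gap,     # x[i-1] against a gap
                         cur[j - 1] + gap)  # y[j-1] against a gap
        prev = cur
    return prev[-1]


def adjusted_score(query: str, target: str, n_shuffles: int = 25, seed: int = 0):
    """Return (S_o, S_r, sigma, AS) with AS = (S_o - S_r) / sigma.

    S_r and sigma come from aligning composition-preserving shuffles of
    the query against the target.
    """
    rng = random.Random(seed)
    s_o = global_score(query, target)
    random_scores = []
    for _ in range(n_shuffles):
        letters = list(query)
        rng.shuffle(letters)  # same amino acid composition, random order
        random_scores.append(global_score("".join(letters), target))
    s_r = statistics.mean(random_scores)
    sigma = statistics.stdev(random_scores)
    if sigma == 0:
        raise ValueError("shuffled scores have zero spread; AS is undefined")
    return s_o, s_r, sigma, (s_o - s_r) / sigma


if __name__ == "__main__":
    segment = "GTSSSDATTAYANSVFNIIQATSANVARL"  # IBV segment of Fig. 1, uppercased
    s_o, s_r, sigma, a_s = adjusted_score(segment, segment)
    print(f"S_o={s_o} S_r={s_r:.1f} sigma={sigma:.2f} AS={a_s:.1f} SD")
```

Raising `n_shuffles` to 300 mirrors the number of scrambled versions quoted for SCORE; the resulting AS values will not match the published ones because the scoring scheme differs.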
The statistical significance of manual alignments was assessed
by program SCORE. The average per-residue score was computed
for a query sequence versus a group of aligned sequences, and AS
was calculated by the above equation using 300 randomly
scrambled versions of the query sequence (E.V.K. et al.,
in preparation). The probability of chance similarity between
two sequences aligned without gaps ('double matching
probability') was calculated using the algorithm of McLachlan
(14).
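The idea of a chance probability for an ungapped match can be illustrated with a generic convolution approach. This sketch is written in the spirit of McLachlan's test but does not claim to reproduce his exact procedure: each aligned position is treated as an independent draw of one residue from each sequence's composition, the per-position score distribution is convolved once per position, and the tail probability of reaching the observed score is returned. The group-based scoring is the same assumed stand-in for MDM78 as above, and `ungapped_chance_probability` is a hypothetical name.

```python
from collections import Counter

GROUPS = ["LIVM", "AG", "ST", "DENQ", "KR", "FYW"]  # Fig. 1 legend groups
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def pair_score(a: str, b: str) -> int:
    """2 for identical residues, 1 for residues in the same group, else 0."""
    if a == b:
        return 2
    group = GROUP_OF.get(a)
    return 1 if group is not None and group == GROUP_OF.get(b) else 0


def ungapped_chance_probability(x: str, y: str):
    """Return (observed score, P[random ungapped score >= observed])."""
    if len(x) != len(y):
        raise ValueError("sequences aligned without gaps must have equal length")
    comp_x, comp_y = Counter(x), Counter(y)
    # Score distribution for a single aligned position.
    single = {}
    for a, count_a in comp_x.items():
        for b, count_b in comp_y.items():
            s = pair_score(a, b)
            single[s] = single.get(s, 0.0) + (count_a / len(x)) * (count_b / len(y))
    # Exact distribution of the summed score: convolve once per position.
    total = {0: 1.0}
    for _ in range(len(x)):
        step = {}
        for s1, p1 in total.items():
            for s2, p2 in single.items():
                step[s1 + s2] = step.get(s1 + s2, 0.0) + p1 * p2
        total = step
    observed = sum(pair_score(a, b) for a, b in zip(x, y))
    tail = sum(p for s, p in total.items() if s >= observed)
    return observed, tail
```

For example, `ungapped_chance_probability("IGVSRLQSGFKKLVSP", "IGGVRLQSSFVRKATS")` would compare the windows around two of the putative cleavage sites discussed below; the independence assumption makes such estimates indicative rather than exact.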


RESULTS AND DISCUSSION 

Approach 

As the first step towards identification of functional domains
in coronavirus polyproteins, it was natural to try to find
coronaviral counterparts of the most highly conserved proteins
of positive strand RNA viruses. Such proteins are, in order
of decreasing conservation: i) RNA-dependent RNA polymerases,
present in all viruses of this class and always having a
similar central segment (15,16); ii) NTP-binding
motif-containing proteins involved in RNA replication, some of
which are similar to helicases; proteins of this type were
identified in all eukaryotic positive strand RNA viruses whose
genome lengths exceed 6.3 kb (6-9 and manuscript in
preparation); iii) 3C proteases of picornaviruses and similar
enzymes revealed in como-, nepo- and potyviruses (17-23).
Clearly, at least for the first and the second groups of
enzymes, the case for the existence of coronaviral homologs
seemed quite strong.

Alignments of conserved fragments of these three groups of
viral proteins were used as probes to screen the sequences of
the F1 and F2 polypeptides of IBV by program MULDI. Segments of
these proteins best matching the probes were fitted into the
respective alignments by program OPTAL (or visually), and the
significance of the observed similarity was assessed
accordingly. An additional search by the same procedure was made
for segments of coronaviral proteins similar to different
classes of cellular proteases and to certain other sequence
motifs conserved in cellular proteins. Identification of the
putative helicase was described previously (see Introduction);
other results are presented below.
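The probe screening step can be illustrated with a simple sliding-window scan. This is a sketch of the general idea only, not of MULDI, which works on diagonal plots with the MDM78 matrix: an ungapped block of aligned probe sequences is slid along the query, and each window gets the average per-residue score against all probe rows, using the same assumed group-based score as above. The function name `scan_with_probe` is hypothetical.

```python
GROUPS = ["LIVM", "AG", "ST", "DENQ", "KR", "FYW"]  # Fig. 1 legend groups
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def pair_score(a: str, b: str) -> int:
    """2 for identical residues, 1 for residues in the same group, else 0."""
    if a == b:
        return 2
    group = GROUP_OF.get(a)
    return 1 if group is not None and group == GROUP_OF.get(b) else 0


def scan_with_probe(probe: list[str], query: str, top: int = 3):
    """Slide an ungapped block of aligned probe sequences along the query.

    Returns the `top` windows as (average per-residue score, start index),
    best first; gap characters in the probe contribute nothing.
    """
    width = len(probe[0])
    if any(len(row) != width for row in probe):
        raise ValueError("probe rows must be aligned to the same length")
    hits = []
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        total = sum(pair_score(q, p)
                    for row in probe
                    for q, p in zip(window, row) if p != "-")
        hits.append((total / (len(probe) * width), start))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits[:top]
```

As a usage example, a two-row probe taken (uppercased) from the PV and FMDV rows around the catalytic Cys in Fig.2, `["AGQCGG", "AGYCGG"]`, scanned against the made-up query `"MKTLLVNN" + "AGACGSVGFNIEK"` (hypothetical flank followed by the IBV segment of Fig.2) ranks the window starting at the IBV `AGACGS` first.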

RNA-dependent RNA polymerase 

In the F2 polypeptide, two segments similar to the two most
conserved sequence blocks of (putative) positive strand RNA
viral RNA polymerases were detected. Inspection of the
neighboring regions of F2 also revealed putative counterparts
of other conserved stretches of polymerases. As can be seen in
the resulting alignment (Fig.1), this part of F2 contained all
the amino acid residues invariant in other viral polymerases
except one, as well as many partially conserved residues. A
notable exception is the substitution of S for G in the
so-called GDD site, considered to be the most characteristic
sequence of positive strand RNA viral RNA polymerases (15,16,
22). Presumably, it was this substitution that prevented other
investigators from identifying the IBV polymerase.
Evaluation of the alignment of the 4 picked segments of F2 with
the conserved segments of 40 (putative) polymerases of positive
strand RNA viruses by program SCORE showed significance at the
9.2 SD level. Lengths of the variable spacers separating
conserved fragments in the putative polymerase of IBV are
generally within the limits set by other polymerases, although
the coronavirus one appears to be among the longest.
Unexpectedly, a 19 amino acid residue segment of F2 has been
shown to possess
MS2   : 231  6idlndqsiNQrLaQqgsvdg--sLatiDlssasd»I*DrLvws{  18  vdCetirwel
PV    : 212  Gcdp-dlfWskIpvlMeEk-Lfa<DYtgyDa»Lspan<eftl  25  yknktYcvkG
HAV   : 220  Sidp-drqWDELfKtMIrfgD--VgLdlDFsafDa«LspfMlreA  27  1ynccYhvCG
CPMV  : 252  GinpysoeWsrlaarMkEkgN--dVLccDY*tfDgLL»kqVadvi  30  ckntVNrvec
YFV   : 410  6ig1qylGYvirdlaAMDSg--gfyadDtag«DtrItEadldde  37  AyedVlirrd
SNBV  : 349  lfdasaedFDalIa£hFkqgD--pVL«tDiaifDKaqdDaMaLtg  29  pT6trFkfga
TMV   : 239  rkTp-aqieDfFgd1dshvpn--dVL»lOiaKyDKsqnEfheAve  29  taGiktciwy
BMV   : 438  fivpigkiesleLKNVrlnnr--yfLeaDlB(!fDKfqgELhltfq  28  hakvgesvsf
BSMV  : 508  hmTa-delnEtVafltphk-y--raLeiDFsKfDKsktgLhikAv  29  nfGleaylly
CarMV : 201  GyTteevAqhiwsaunqiqtp--VaI6FD»sRfDqhVsvaalefe  31  ngalrYtKeG
BBV   : 562  Grnp-teiaDgVcefVgEcda--eVIetDF«nlDqrVttwHqrni  32  rfGfrYepgv
PPV   : 222  GoTKFrGGHDkLLRaLpEE»-lycdaDgtqfDt»LspyLinAv  33  pd6tlvkKfk
TEV   : 222  GaTKFYQGWNELMsaLpstm--VycdaDgtqfD»«LtpfLinAv  33  pdGtlikKhk
TVMV  : 222  GoTKFYGGWNELLgkLpDGw-VycdaDgsqf0»sLspyLinAv  34  pdGtlvkKfk
IBV   : 597  6tTKFY66WDNHLRNU0GtvEdpILH6MDYpKcDR«HpNLlrIAA  33  ATGgIYvKpG

MS2   : fSTm6NgfTfelESMifwaivkatQIhfq   3  TIglygDDilcp  25  sglfrEsCgaHfyrg  160
PV    : GepSGcsgTsifNSMiNnLiirTllLkty   8  kHUygDDvIAs   34  tmenvtFlkrffrAd   77
HAV   : BmpS6spcTAllNSIiNnlNlyyvfskif  11  rILcygDDvLIv  38  pvseltF1krsfnLV   84
CPMV  : gipSGfpmTvivNSIFNelliryhykklM  16  gLVtygDDnLIs  39  rleecDFlkrtfVqr  281
YFV   : qrgSSQvvTyalNTItNlkvqliroaeae  32  rMaVsgDDcVVr  33  DwenvpFCShHfheL  184
SNBV  : onkSGmflT1FvNTVlNVViASrv1eeRL   4  rHaVsgDDcVVr  27  gerPpyFCggfiLqd   97
TMV   : qrkSGDvTTfiGNTViiaadaSalpaek    2  caaflgDDnllh  25  kKqygyFCgryvIhh   92
BMV   : qrrTGDAfTyFGNTLvtManiayABdlsd   2  calfsgDDsLIi  23  DpsvpyvCSkf1Vet  220
BSMV  : qqkSGNcdTygsNTwsaaLalldclpled   2  hfcVggDDsLLy  25  DfkypaFCgkf1Lcl  103
CarMV : crnSGDimTAlGNcLlaclitkhlekiRs   0  rLlnngDDcVli  31  EaekirFCqnapVfd  146
BBV   : GvkSGssTTtphNTqYNgcveiTAltfeh  11  igpkcgDDGLsr  24  peiglcFISrvfVdp  150
PPV   : GnnSGQpSTvvdNTLnvILaaTysllklg  10  ryfVngDDlVia  30  NKeelwFaShkgVLy  116
TEV   : GnnSGQpSTvvdNTLavIIaalytcekcg   6  vyyVngDDlLla  30  DKtqlnFmShraler  114
TVMV  : GnnSGQpSTvvdNTLavVLasyyAlsklg  10  kffangDDHIa   30  OKkelnFeShralsk  114
IBV   : GTSSSDATTAYANSVFNIIQATSANVaRL  46  SLMIlsDDGVVc  40  EKgPhEFCSqHtMLV  112

Fig.1. Alignment of a fragment of the putative RNA-dependent RNA
polymerase of IBV with evolutionarily conserved fragments of
selected (putative) polymerases of other positive strand RNA
viruses.

The sampling of the (putative) polymerases was compiled so as to
represent the main groups of positive strand RNA viruses and the
entire range of sequence variability of this protein (cf. 16).
Abbreviations: MS2, MS2 bacteriophage; PV, poliovirus type 1;
HAV, hepatitis A virus (picornaviruses); CPMV, cowpea mosaic
virus (a comovirus); YFV, yellow fever virus (a flavivirus);
SNBV, Sindbis virus (an alphavirus); TMV, tobacco mosaic virus
(a tobamovirus); BMV, brome mosaic virus (a tricornavirus);
BSMV, barley stripe mosaic virus (a hordeivirus); CarMV,
carnation mottle virus; BBV, black beetle virus (a nodavirus);
PPV, plum pox virus; TEV, tobacco etch virus; TVMV, tobacco vein
mottling virus (potyviruses). The lengths of the terminal
regions and of the variable spacers separating the conserved
segments are designated by numbers. For IBV, the boundaries of
the polymerase were predicted from analysis of the putative
a very remarkable similarity to a segment of RNA polymerases of
potyviruses which is relatively variable among positive strand
RNA viruses in general (Fig.1). For this segment, the
similarity between IBV and the potyviruses is comparable to
that between the potyviruses themselves, and unprecedented for
positive strand RNA viruses of different families. Taken
together, these observations strongly suggest that the
pinpointed region of F2 is the core domain of the IBV
RNA-dependent RNA polymerase. As for the aforementioned
substitution in the 'GDD box', it is relatively conservative in
nature and, more importantly, involves a residue which obviously
plays a structural, rather than catalytic, role. It is perhaps
relevant that polymerases of MS2 and related phages, for which
the activity had been firmly established, also bear a
substitution of an otherwise conserved residue, i.e. Glu for
Asn (cf. Fig.1).
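Why a strict motif search would miss the IBV polymerase can be shown with a minimal pattern scan. This is an illustrative sketch, not a method used in the study: the function name `find_gdd_boxes` is hypothetical, and the test sequences are the PPV and IBV segments around the GDD site in Fig.1, uppercased.

```python
import re


def find_gdd_boxes(seq: str, allow_ser: bool = False) -> list[int]:
    """Start indices of the polymerase 'GDD box' in an uppercase sequence.

    With allow_ser=True the first position may also be Ser, i.e. the
    S-for-G variant seen in the putative IBV polymerase.
    """
    pattern = "[GS]DD" if allow_ser else "GDD"
    return [m.start() for m in re.finditer(pattern, seq)]
```

A strict `GDD` search finds the box in the PPV segment `RYFVNGDDLVIA` but returns nothing for the IBV segment `SLMILSDDGVVC`; allowing Ser recovers it. The relaxed pattern is, of course, less specific, which is why the similarity of the surrounding conserved blocks carries the argument above.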

Two types of RNA-synthesizing complexes, differing greatly
with respect to enzymatic properties and products synthesized,
were isolated from coronavirus-infected cells (24). Also,
coronaviruses are known to have a unique mechanism of
subgenomic RNA synthesis quite distinct from that of genome
replication (3). Thus, it is not unlikely that IBV could have
more than one RNA polymerase. However, our search did not
reveal any segments of F1 or F2 significantly similar to
viral polymerases except that shown in Fig.1, though some
sequences of marginal similarity could be detected in the
C-terminal parts of both polyproteins. Thus, if the IBV genome
encodes a second RNA polymerase, its sequence should be very
different from those of other positive strand viral polymerases.

3C-like protease

In the F1 polypeptide, sequence stretches similar to all three
conserved segments of 3C-like proteases (19) were detected.
Alignment of a 188 residue piece of F1 with 14 viral proteases
proved to be significant at the 5.7 SD level. Notably, the His,
Asp(Glu) and Cys residues conserved in 3C-like proteases and
thought to constitute their catalytic triad (19) were
identified also in the coronavirus sequence (Fig.2). The
putative coronavirus protease contains one replacement of a
residue invariant in other 3C-like proteases. This is the
substitution of Tyr for Gly in the sequence GXH in the vicinity
of the proposed catalytic Cys residue (Fig.2). It is notable
that, just like the replacement in the putative polymerase


Fig.1 legend (cont.)

cleavage sites (see text and Fig.6); the sequence shown is
residues 549 to 780 of the F2 polypeptide (4). The PPV sequence
is from (39), and the BSMV one from (40). For sources of the
other sequences see (16). Capitals: residues identical or
similar to the respective residues of IBV; colons: positions
where residues identical or similar to those of IBV are observed
in more than half of the included sequences. Residues belonging
to one of the following groups were regarded as similar:
L,I,V,M; A,G; S,T; D,E,N,Q; K,R; F,Y,W. Asterisks: consensus
residues of positive strand RNA viral polymerases (15,16,22).
Boxed: region of high local similarity between putative
polymerases of IBV and potyviruses.
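The display conventions of this legend can be expressed as a short procedure. In this sketch (an illustration, not the authors' program), each non-reference row is rewritten with residues identical or similar to the reference (IBV) residue in capitals and all others in lower case, and a conservation line gets ':' where more than half of the non-reference rows match; counting only the non-reference rows is an assumption. The names `similar` and `mark_alignment` are hypothetical.

```python
GROUPS = ["LIVM", "AG", "ST", "DENQ", "KR", "FYW"]  # similarity groups of this legend
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}


def similar(a: str, b: str) -> bool:
    """Identical residues, or residues from the same similarity group (gaps never match)."""
    a, b = a.upper(), b.upper()
    if a == b:
        return a != "-"
    group = GROUP_OF.get(a)
    return group is not None and group == GROUP_OF.get(b)


def mark_alignment(reference: str, others: list[str]):
    """Return (re-cased rows, conservation line) relative to the reference row."""
    for seq in others:
        if len(seq) != len(reference):
            raise ValueError("all rows must be aligned to the reference length")
    marked = ["".join(c.upper() if similar(c, r) else c.lower()
                      for c, r in zip(seq, reference))
              for seq in others]
    conservation = "".join(
        ":" if sum(similar(seq[col], r) for seq in others) > len(others) / 2 else " "
        for col, r in enumerate(reference))
    return marked, conservation
```

For a toy three-column example, `mark_alignment("GDD", ["SDD", "GEW", "AKY"])` gives the rows `sDD`, `GEw`, `Aky` and the conservation line `":: "`.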
PV   (41):  24  FtMlGV-hdNvailPtH-29-lElTiitlkrnE-62-AGqCGg-vitct-G--kvigMH-Vgg  19
HRV  (42):  24  ftglGV-ydrfVvvPtH-29-1ElTvlkldrnE-62-s6yCGg-vlyki-G---QvlglH-Vgg  19
EMCV (43):  32  Qtcl1V-rGrTLvvnRH-32-tDVSfir16Sgp-65-kGxCSSal1adl-Ggskki1glH-sag  25
FMDV (44):  32  ccatGV-fGtaylvPRH-36-sOaalnvlHrgN-65-AGyCGgavlakD-GadtfivgtH-sag  29
HAV  (45):  32  fflNalGV-kdDwLlvPsH-38-qDVvlakvpTIp-74-p6mCGGslvssNqsIqnai1glH-Vag  23
CPMV (46):  24  lQIvaV-pGrrf1acKH-34-sELvlyssipSLE-71-pedCGSlviahi-GgkhkivgVH-Vag  21
TBRV (21):  22  vsafflqy-knkSVrotRH-36-sEIvTwlApSLp-73-nddC6mIilcqi-kgk»rvvgMl-Vag  19
TEV  (33): 217  tsLyGIgfGpfIitnKH-34-rDMiiirnpkd--56-dGqC6SplvstrdG--fivgIHsasn  71

IBV  (4):   24  NNLnGLwLGDTIycPRH-21-fEVTTqhGVTLN-65-AGaCGSVgfniEkGVv-NffyMHhLel  142


Fig.2. Alignment of a fragment of the putative 3C-like protease of
IBV with conserved fragments of selected cysteine proteases of
other positive strand RNA viruses.

The representative sampling of (putative) proteases was
generated as indicated in the legend to Fig.1. Additional
abbreviations: HRV, human rhinovirus type 2; EMCV,
encephalomyocarditis virus; FMDV, foot-and-mouth disease virus
(picornaviruses); TBRV, tomato black ring virus (a nepovirus).
The boundaries of the putative protease of IBV were predicted as
indicated in the legend to Fig.1; the sequence shown is residues
2804 to 2845 of the F1 polypeptide (4). Source references for
the other sequences are given in parentheses before each
sequence. Asterisks: putative catalytic residues; other
designations as in Fig.1.


SP   QPVVKSLLDSKGIHYNBGNPYNLL.TPVJEKVKPGEBGFVGOAATGHCVATATABIMKYHNYPDKGLK
IBV  SNCPTCGANNTDEVIEASLPYLLLFATDGPATVDCOEDAVGTVVFVGSTNSGHCYTBAAGBAFDNLAKDRKFGK

SP   NYTYTLSSNPDYFDHPKNLFAAISTRBYDWNNILPTYS-GRQSONVKMAISELMADVGISVDHDY6PSSGS
IBV  KSPYITAHYTRFAFKNETSIPVAKQSKSK5KSVKEDVSNLATSSKASFDNLTDFEDHYDSNIYESLKVQESPDN

SP   AG.SSRVQRALKENF6YN8SVH8INRGDFSKBDHEABIDKELSQNBPVYY-.-.EGVGK-V
IBV  FDKYVSFTTKEDSKIPITLKVRGIKSVVDFRSKDGFIYKLTPDTDENSKAPVYYPVLDAISLKAIHVEGNANFV

SP   GGHA-FVIDD-GAGRNFYHVDW6WGGVSDGFFRLDALNPSALSTGGGASSFNGYESAVVGIKP
IBV  V6HPNYYSKSIHIPTFNENAENFVKHGDKI6SVTM8U(RAEHLNKPNLERIFNIAKKAIV6SSVVTT8C


Fig.3. Alignment of the putative second cysteine protease
domain of IBV with the protease of Streptococcus pneumoniae.

The IBV sequence is residues 1385 to 1677 of the F1 polypeptide
(4). The S. pneumoniae protease sequence was from (47). Colons:
identical residues; dots: similar residues; asterisks: putative
catalytic residues. The alignment generated by program OPTAL
(see Methods) was slightly corrected to improve local
similarity around the catalytic His residue of the bacterial
protease and the corresponding residue of IBV.
II   factor VII     epCqngggC-kDql-qsYiCfClp
     factor IX      npCLnggsC-kDdi-nsYeCwCpf
     factor X       spCqnQgkC-kDgl-geYtCtCle
     prC            eICcghgtC-iDgi-gsFsCDCrS
     prZ            qpCLnNgsC-qDst-IGYACtCap

I    uPA            --CLnggtCvSnkyfa-nihwCNCpk
     tPA            prCfnggtCqqqlyfs-dfv-CQCpe
     vaccinia 19K   GYCLhgd-CiharDid-gmY-CrCch
     TGF            qFC-fhgtC-rflvqe-dkpACvChS
     EGF            GYCLnggVC-mhiEld-ssYtCNCvi

     IBV F1         GFCLrNkVC-TVCQcw-IGYGCQCDS

III  LDL R exon 7   --CLdNggCshVCNdlkIGYeCICpd
     LDL R exon 8   --CqdpddCsqLCpdlegGYkCQCEe


Fig.4. Alignment of a cysteine-rich segment of the F1
polypeptide of IBV with receptor-binding domains.

The IBV sequence is from residue 3894 to 3917 (4).

For sources of the other sequences see (25). Abbreviations:
factors VII-X, respective human coagulation factors; prC, human
plasma protein C; prZ, human plasma protein Z; uPA,
urokinase-type plasminogen activator; tPA, tissue-type
plasminogen activator; vaccinia 19K, growth factor-like protein
of vaccinia virus; TGF, transforming growth factor; EGF,
epidermal growth factor; LDL R, low density lipoprotein
receptor. The grouping of the EGF-like domains and the
numbering of Cys residues are according to (25). Disulfide bonds
Cys 1-3, Cys 2-4 and Cys 5-6 are expected to form, but Cys 6,
having no counterpart in the IBV sequence, is not shown. Other
designations as in Fig.1.


discussed above, this one includes a Gly residue which cannot 
be directly involved in catalysis. Another conserved Gly 
residue is substituted by Glu in the CPMV protease, the activity 
of which was determined in unequivocal experiments (cf. 23). 

2nd cysteine protease 

Upon comparison of the sequences of F1 and F2 with those
of cellular proteases, a segment of F1 was found to be
remarkably similar to a fragment of the catalytic center of
the Streptococcus pneumoniae cysteine protease. Alignment of the
respective portion of F1 with this protease (Fig.3) is
significant at approx. the 5 SD level. The two most prominent
regions of similarity (N- and C-terminal) include segments of
the bacterial protease around the catalytic Cys and His
residues. Corresponding residues could be identified in IBV,
emphasizing the possibility that this segment of F1 could be an
authentic protease.

Cysteine-rich segments

An interesting feature of the F1 and F2 polypeptides is the
presence of several segments with an anomalously high content of
Cys residues. One of these segments resides in the C-terminal
Fig.5. A model of possible organization of the putative
metal-binding ("finger") domain of the F2 polypeptide of IBV.
Amino acid residue numbering is indicated. Alternative
configurations involving other pairs of Cys and His residues
are also possible. M, metal (probably Zn2+) cation.

Highlighted: similar sequence stretches adjacent to putative
metal-binding residues; aromatic residues conserved in
TFIIIA-like fingers.

part of F1. It was shown to be significantly similar to the
receptor-binding site of murine epidermal growth factor
(probability of fortuitous similarity approx. 10^-10).

Recently, EGF-like domains have been divided into three groups
differing in the arrangement of cysteine residues and the
lengths of spacer segments (25). While bearing the most
significant similarity to group 1 domains (EGF, uPA etc.), the
IBV domain contains counterparts to only 4 of the 6 Cys residues
(residues 2-5 in Fig.4) which are highly conserved within this
group and are thought to form three disulfide bonds. On the
other hand, one of the additional Cys residues present in the
IBV sequence could be aligned with Cys 1 of the group 3 domains
(LDL R and some others), to which the IBV domain is also
considerably similar (Fig.4). Cys 6, however, is absent from
this domain. It may be speculated that disulfide bonding might
occur between Cys 5 and some more distant Cys residue; several
such residues are available in F1 to the N-side of the EGF-like
domain. Thus, IBV appears to possess a novel type of EGF-like
domain.

Another cysteine-rich segment lies in F2, between the
putative RNA polymerase and the RNA helicase. This 30 residue
stretch contains 9 Cys and 4 His residues, conforming to the
formula of the so-called "finger" Zn2+-binding motif,
$C{-}X_{2\text{-}4}{-}C{-}X_{2\text{-}15}{-}a{-}X_{2\text{-}4}{-}a$,
where a is C or H and X is any amino acid residue,
characteristic of numerous DNA- and RNA-binding proteins
(26-28). It is potentially capable of forming three "fingers"
supported by Cys and His residues which might tetrahedrally
coordinate Zn2+ cations (Fig.5), suggesting classification as
a class I (i.e. multi-finger) domain (28). No general consensus
for finger domains beyond the (putative) metal-binding Cys and His
PUTATIVE CLEAVAGE SITES

No   -7 -6 -5 -4 -3 -2  -1/+1  +2 +3 +4 +5 +6 +7 +8 +9   NC  Coordinates  Protein function
 3    I  G  V  s  R  L   Q/S    G  F  K  K  L  V  S  P       F1 2779      MP1
 4    I  G  G  V  R  L   Q/S    S  F  V  R  K  A  T  S       F1 3086      3CL
10    R  A  P  T  T  L   Q/S    C  G  V  C  V  V  C  N       F2  891      POL
11    D  S  E  T  S  L   Q/G    T  G  L  F  K  i  C  N       F2 1492      HEL
 1    a  d  V  L  R  f   Q/S    A  r  V  i  a  e  d  V    7  F1  440
 2    a  m  V  I  K  f   Q/G    V  F  K  a  y  A  T  T   10  F1 2583
 5    t  A  f  k  c  V   Q/G    C  Y  H  n  s  f  n  T    8  F1 3214      MP2
 6    s  T  N  I  1  I   Q/G    i  G  g  d  R  V  1  P   10  F1 3365
 7    K  r  8  T  V  L   Q/S    v  t  q  e  f  s  h  i    5  F1 3462
 8    s  n  V  V  V  L   Q/S    k  G  h  e  t  e  e  V    6  F1 3784
 9    0  P  k  S  S  V   Q/S    v  A  9  a  s  d  f  D    8  F1 3928      GFL
12    K  S  f  s  a  L   Q/S    i  d  n  i  a  y  n  a    6  F2 2012
13    t  c  y  p <1  L   Q/S    A  W  t  C  g  y  n  m    6  F2 2350

Fig.6. Putative cleavage sites in F1 and F2 polyproteins of IBV.


The sites are numbered beginning from the N-terminus of F1. The
4 sites which were identified first (3, 4, 10 and 11) and
constituted the reference set for identification of the other
putative sites are shown in the upper 4 rows. In the other
sequences, capitals highlight residues having identical or
homologous counterparts in at least one of the sequences of the
reference set. NC: number of residues having counterparts in
the reference sequences. MP1, MP2, putative membrane proteins;
POL, putative RNA-dependent RNA polymerase; HEL, putative RNA
helicase; 3CL, putative 3C-like protease; GFL, growth
factor-like domain. In the 'protein function' column, proteins
are indicated whose C-terminus may be flanked by the given site.


residues can be derived (26-28), and the putative finger domain
of IBV does not appear to bear significant sequence similarity
to any particular finger domain of other proteins.

Specifically, it does not contain the more strict consensus
typical of classical TFIIIA-like fingers (29), although two of
the residues thought to be important for proper folding of the
latter are present (highlighted in Fig.5). Nevertheless, the
conservation of the typical "polarity" of finger domains, with
the N-terminal pair of consensus residues represented by
Cys2 and the C-terminal pair by any possible combination
of Cys and His, in all three coronavirus fingers is
notable. Also of interest is the similarity between short
sequence stretches adjacent to some of the candidate
metal-binding residues (Fig.5). Moreover, two of these
stretches, flanking the Cys residues from the N-side, strikingly
resembled the respective sequences in the finger domains of the
yeast transcription activator ADR1 (30; data not shown). Thus,
whereas the finger-like structures of IBV may not be close
structural analogs of TFIIIA-like fingers (cf. 29), it seems
likely that they constitute an authentic metal-binding and
nucleic acid-binding domain.
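The finger formula quoted above translates directly into a regular expression, which is a convenient way to enumerate candidate fingers in a Cys/His-rich stretch. This is a sketch under stated assumptions: the pattern encodes only the spacing rule, the lookahead is used so that overlapping candidates are reported, only the first (greedy) match from each start position is returned, and the test sequence is made up rather than taken from IBV. `find_finger_motifs` is a hypothetical name.

```python
import re

# C-X(2-4)-C-X(2-15)-a-X(2-4)-a, with a = C or H and X = any residue.
# The zero-width lookahead lets overlapping candidates be reported.
FINGER = re.compile(r"(?=(C.{2,4}C.{2,15}[CH].{2,4}[CH]))")


def find_finger_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (start index, matched stretch) for every candidate finger."""
    return [(m.start(), m.group(1)) for m in FINGER.finditer(seq)]
```

Because the rule is permissive, almost any sufficiently Cys- and His-rich stretch will match it; the arguments in the text (polarity, flanking-sequence similarity, resemblance to ADR1) are what make the IBV assignment plausible, not the pattern alone.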
Putative cleavage sites

We have tentatively identified two protease domains in the F1
polypeptide of IBV. Of these, the cleavage specificities of
3C-like proteases have been studied in considerable detail (for
reviews see Refs. 31,32). They primarily cleave at the dipeptides
Q,E/G,S,A. Cleavage occurs selectively and, unfortunately, the
requirements for a site to be utilized are not fully
understood, probably differing considerably in different
viruses. Nevertheless, in potyviruses a clear consensus (though
unique for each virus) for the sequences flanking the cleavage
sites has been derived (20,33). This encouraged comparison of
the sequence stretches centering at Q/S,G dipeptides in the
polyproteins of IBV. At the first step, we compared those sites
which could flank the putative protease, polymerase and
helicase domains. We observed that the distances between highly
conserved sequence stretches and protein termini vary to a
rather limited extent in most enzymes of each class (Figs.1,2
and data not shown). Thus, three Q/S sites and one Q/G site were
identified in the respective regions of the IBV polyproteins,
i.e. sites 3, 4, 10 and 11 in Fig.6. Sites 3 and 4 flank the
putative protease, and sites 10 and 11 the putative helicase,
site 10 being also the probable C-terminus of the polymerase;
the site flanking the polymerase from the N-side was less
easily determined (see below). Sequences around these 4 sites
bear considerable similarity to each other. Especially
pronounced is the similarity between consecutive sites
delineating each domain. It could be calculated that the
probability of the similarity between sites 3 and 4 being 
fortuitous was about 10“ & , and for sites 10 and 11 about 
10“^. It could be shown that the similarity within these 

two pairs was most prominent among all sequence stretches
surrounding Q/G,S dipeptides in F1 and F2.

Based on these observations, we further compared the sequences
flanking all the Q/S,G dipeptides contained in the F1 and F2
polyproteins to those surrounding the 4 tentatively identified
cleavage sites. Thus, 9 additional putative cleavage sites
bearing some resemblance to the first 4 were identified
(Fig. 6). A notable feature of all 13 detected sites is the
presence of a hydrophobic residue (mostly L) in position -1
which is thought to be most important for cleavage by 3C-like
proteases (31). Also of interest are peculiarities of sites 3
and 4 flanking the putative 3C-like protease (F in position +3
and a positively charged residue in position -3), shared by
site 2. It is tempting to speculate that these may be specific
requirements for intramolecular cleavage. Some of the sequences
shown in Fig. 6 bear additional similarities to each other (for
example, sites 12 and 13), strengthening the case for their
authenticity. Finally, a striking resemblance is observed
between some of the putative cleavage sites of IBV (especially
sites 1, 2 and 4) and the consensus (VRFQ/S,G) derived for the
polyprotein cleavage sites of one of the potyviruses, TVMV (20).
However, contrary to what is observed in potyviruses (34,35),
the C-flanking sequences of the putative coronavirus cleavage
sites are also somewhat similar to each other (Fig. 6) and, by
implication, might be important for processing.
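The comparison of Q/S,G dipeptides against the reference set can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the reference windows are transcribed from the upper four rows of Fig.6 and uppercased, "homologous" is approximated by the similarity groups of the Fig.1 legend, and only the 14 flanking positions (-7 to -2 and +2 to +9) are counted, so the counts need not reproduce the NC column. The function names are hypothetical.

```python
GROUPS = ["LIVM", "AG", "ST", "DENQ", "KR", "FYW"]  # Fig. 1 legend groups
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}

# Reference set (sites 3, 4, 10, 11) transcribed from Fig. 6, uppercased:
# residues -7..-2 to the left and +2..+9 to the right of the Q/S or Q/G pair.
REFERENCE = [
    ("IGVSRL", "GFKKLVSP"),  # site 3
    ("IGGVRL", "SFVRKATS"),  # site 4
    ("RAPTTL", "CGVCVVCN"),  # site 10
    ("DSETSL", "TGLFKICN"),  # site 11
]


def similar(a: str, b: str) -> bool:
    """Identical residues, or residues from the same similarity group."""
    if a == b:
        return True
    group = GROUP_OF.get(a)
    return group is not None and group == GROUP_OF.get(b)


def flank_matches(left: str, right: str) -> int:
    """Count flanking positions matching the same position of any reference site."""
    count = sum(any(similar(res, ref_left[i]) for ref_left, _ in REFERENCE)
                for i, res in enumerate(left))
    count += sum(any(similar(res, ref_right[i]) for _, ref_right in REFERENCE)
                 for i, res in enumerate(right))
    return count


def candidate_sites(seq: str):
    """Score every Q/S and Q/G dipeptide that has a full -7..+9 window.

    Returns (0-based index of Q, dipeptide, matching flank positions), best first.
    """
    hits = []
    for i in range(6, len(seq) - 9):
        if seq[i] == "Q" and seq[i + 1] in "SG":
            left, right = seq[i - 6:i], seq[i + 2:i + 10]
            hits.append((i, seq[i:i + 2], flank_matches(left, right)))
    return sorted(hits, key=lambda hit: (-hit[2], hit[0]))
```

Ranking by the number of matching flank positions mirrors the idea of the NC column; a threshold on that count, chosen by inspection, would then separate putative sites from dipeptides thought not to be utilized.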
Fig.7. A scheme depicting a possible organization of functional
domains in the non-structural polyprotein of IBV.

F1 and F2 polypeptides are shown to scale. Filled circles:
putative cleavage sites numbered as in Fig.6; empty circles:
Q/G,S dipeptides whose flanking sequences bear no significant
resemblance to those of the reference set (Fig.6) and which are
thought not to be utilized. SPL, putative protease similar to
the S. pneumoniae protease; ZnO, putative Zn2+-binding
domain. Regions of significant sequence similarity to
respective viral or cellular proteins are shown in black. Other
designations as in Fig.6.

Implications for coronavirus polyprotein organization and expression

Fig. 7 schematically summarizes what could be derived from the
amino acid sequence of the coronavirus non-structural
polyprotein(s) by computer-assisted comparisons. The approx.
6600 amino acid residue polyprotein (provided F1 and F2 are
actually joined by a translational frame-shift) may be
provisionally separated into two vast regions, the N-terminal
one of about 2600 residues, and the approx. 4000 residue
C-terminal one. The C-terminal part encompasses the putative
cleavage sites for the 3C-like protease which are predicted to
be cleaved (see above), and it is tempting to suggest that
expression of this region of the polyprotein might be
completely controlled by the 3C-like protease. In principle,
cleavage at the sites shown in Fig. 6 might be sufficient for
the generation of all the proteins of IBV essential for genome
replication and expression. Organization of the complex of
these proteins (domains) is in principle similar to that
observed in other positive strand RNA viruses, but certain
interesting unique features are also present. Specifically, the
putative 3C-like protease may be flanked by two relatively
small proteins having long N-terminal stretches of hydrophobic
amino acid residues, presumably membrane-spanning domains, which
might influence the intracellular topology of the protease.
Localization of the 3C-like protease relative to the polymerase
is also typical of other viruses having enzymes of this family,
though a large domain of unknown function is inserted between
the two conserved domains. This domain contains the EGF-like
sequence unique to IBV. The best guess concerning the function
of this domain is that it might be involved in some special
kind of protein-protein interaction. In fact, it is not clear
where the N-terminus of the polymerase lies, the size of this
enzyme varying to a large extent among positive strand RNA
viruses (cf. Fig.1 and refs. 16,22). Still, site 9, separating
the EGF-like sequence from the putative polymerase and bearing
some resemblance to site 10 (Fig. 6), is the most plausible
candidate.

What is unusual in the organization of the putative
complex of IBV replication proteins is the mutual
orientation of the polymerase and helicase domains, which is
reversed compared with that observed in other positive strand
RNA viruses (6). An interesting possibility is that this
arrangement arose as a result of a recombination event, a high
frequency of recombination being a salient feature of
coronavirus reproduction (1-3). Another unique feature of
coronaviruses is the presence of the "finger" domain, which,
provided the cleavage sites are determined correctly, may be
the N-terminal part of the helicase (Fig. 7). It has recently
been demonstrated that small finger proteins of retroviruses
possess RNA annealing activity and mediate positioning of the
replicative primer (a specific tRNA) on the viral genome (36).

A similar role in the primer-dependent transcription and
recombination of the coronavirus genome (cf. 3) is plausible
for the finger domain of IBV. These functions might be
performed in conjunction with the helicase domain. A putative
single-finger domain has also been identified in the N-terminal
portion of the polyprotein of another coronavirus, MHV (37). No
obvious similarity between this region and any IBV sequence
could be detected. Evaluation of the significance of this
observation awaits complete sequencing of the MHV genome.

In the N-terminal portion of the polyprotein, the only
domain for which a function could be proposed is the putative
second thiol protease. Possibly, it controls processing of this
region, whose processing pathway, as well as the functions of
its products, remains obscure. A possible exception is a 440
residue domain at the very N-terminus of F1, flanked by one of
the putative cleavage sites of the 3C-like protease (Fig. 7).
Some sequence similarity has been detected between a portion of
this region and the replication initiation protein of the R6K
plasmid (4, 38). In the absence of any data about the
functional sites of the latter, however, it is difficult to
assess the significance of this observation.

Generally, the identification, in IBV, of putative homologs 
of three conserved domains of positive strand RNA viruses 
suggests an evolutionary relationship between coronaviruses and 
other groups of this class. On the other hand, the unique 
biochemistry of coronaviruses seems to be reflected in the 
unusual arrangement of these domains, and in the presence of 
additional specific ones. 
CONCLUDING REMARKS

The findings reported here may prove important in several
respects. First, and most obviously, one may hope that the
predictions made will catalyze studies aimed at the
experimental dissection of the coronavirus non-structural
polyprotein(s). Second, the putative polymerase, helicase and
3C-like protease of IBV, while related to the similar enzymes
of other positive strand RNA viruses at a statistically
significant level, loosen the respective consensus patterns,
thus providing a new groundwork for probing newly sequenced
viral genomes. Finally, the general approach utilized might
also be helpful in the analysis of other genomes.
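The point that adding divergent homologs loosens a consensus pattern can be made concrete with a toy example. This sketch assumes gap-free, pre-aligned motif segments and a PROSITE-like notation in which each position lists every residue observed there; the sequences are invented for illustration, and real patterns are usually built with more judgment (for example, grouping residues by physico-chemical class) than a plain union of observed residues.

```python
from math import prod
from typing import List


def consensus_pattern(aligned: List[str]) -> str:
    """PROSITE-like pattern: one residue where a column is invariant, else [set]."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("motifs must be aligned to equal length")
    elements = []
    for column in zip(*aligned):
        residues = sorted(set(column))
        elements.append(residues[0] if len(residues) == 1 else "[" + "".join(residues) + "]")
    return "-".join(elements)


def pattern_size(aligned: List[str]) -> int:
    """Number of distinct sequences the pattern accepts (product of column set sizes)."""
    return prod(len(set(column)) for column in zip(*aligned))


# Invented motif segments; the third stands in for a divergent new homolog.
known = ["ALKV", "SLKV"]
extended = known + ["ALRI"]
print(consensus_pattern(known), pattern_size(known))
print(consensus_pattern(extended), pattern_size(extended))
```

Here the pattern grows from [AS]-L-K-V (2 accepted sequences) to [AS]-L-[KR]-[IV] (8), so a search with the looser pattern can pick up more distant relatives, at the cost of more chance matches.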